
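* Load the log-likelihood results for the fitted topic models.
* The clear option on import delimited already drops any data in memory.
import delimited "ll_compare_fit.csv", varnames(1) clear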

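* v1 holds the model label; the part after "fit" is the number of topics.
* split creates v11 (text before "fit") and v12 (text after "fit").
split v1, p("fit")
destring v12, gen(number)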

* Figure A1: Fit of biterm topic models with different numbers of topics
* Notes: Lower absolute values (log-likelihoods closer to zero) reflect a better model fit. The log-likelihood is the sum of the log-likelihoods of all biterms in a model.
* Scatter of log-likelihood (v2) against number of topics, overlaid with a fractional-polynomial fit.
twoway (scatter v2 number) ///
    (fpfit v2 number), scheme(s2mono) graphregion(color(white)) scale(1.1) ///
    ylabel(-9800000(500000)-7300000, angle(0) gmin gmax) xlabel(, grid) ///
    ytitle("Log-likelihood") xtitle(" " "Number of topics") legend(off)


